23 research outputs found

    Dynamic Detection and Mitigation of DMA Races in MPSoCs

    Explicitly managed memories have emerged as a good alternative for multicore processor design in order to reduce energy and performance costs. Memory transfers then rely on Direct Memory Access (DMA) engines, which provide hardware support for accelerating data transfers. However, programming explicit data transfers is very challenging for developers, who must manually orchestrate data movements through the memory hierarchy. In practice this is very error-prone and can easily lead to memory inconsistency. In this paper, we propose a runtime approach for monitoring DMA races. The monitor acts as a safeguard for programmers and is able to enforce at runtime a correct behavior with respect to the semantics of the program execution. We validate the approach using traces extracted from industrial benchmarks and executed on the multiprocessor system-on-chip platform STHORM. Our experiments demonstrate that the monitoring algorithm has a low overhead: less than 1.5 KB of on-chip memory consumption and less than 2% of additional execution time.
    I. INTRODUCTION
    In recent Multiprocessor Systems-on-Chip (MPSoCs) design, a combination of Scratchpad Memories (SPM) [1] and Direct Memory Access (DMA) engines has been proposed as an alternative to traditional caches, where data (and sometimes code) transfers through the memory hierarchy are explicitly managed by the software. This is promising in terms of performance, energy, and silicon area. However, the price to pay is clearly programming complexity, since the programmer/software has a disjoint view of the different levels of memories and must manually orchestrate data movements using explicit DMA operations. In this context, DMA races emerge as one of the regular issues programmers have to face

    Bridging the Gap between Resilient Networks-on-Chip and Real-Time Systems

    Conventional fault-tolerance approaches for Networks-on-Chip (NoCs) cannot be applied to high dependability systems due to their different goals and constraints. These systems impose strict integrity, resilience and real-time requirements. In order to meet these requirements, all possible effects of random hardware errors must be taken into account, silent data corruption must be prevented and the resulting system must be predictable in the presence of errors. In this paper, we present a wormhole-switched NoC with virtual channels for high dependability systems hardened against soft errors. The NoC is developed based on results of a Failure Mode and Effects Analysis. It efficiently handles errors in different network layers and operates with formal guarantees. Our experimental evaluation, including an industrial avionics use case, shows that the network is able to achieve predictable behavior even in aggressive environments with very high error rates while presenting competitive overheads

    Temporal-Based Intrusion Detection for IoV

    The Internet of Vehicles (IoV) is an extension of Vehicle-to-Vehicle (V2V) communication that can improve vehicles' fully autonomous driving capabilities. However, these communications are vulnerable to many attacks. It is therefore critical to provide run-time mechanisms that detect malware and stop attackers before they gain a foothold in the system. Anomaly-based detection techniques are convenient and can detect off-nominal component behavior caused by zero-day attacks. A critical aspect of anomaly-based techniques is correctly defining the normal behavior of the observed component. In this paper, we propose using a task's temporal specification as a baseline to define its normal behavior and to identify temporal thresholds that enable the system to predict malicious tasks. Applying our solution to one use case yielded temporal thresholds 20–40% lower than the one usually used to alert the system to security violations. Using our boundaries ensures early detection of off-nominal temporal behavior and gives the system sufficient time to initiate recovery actions

    The Road towards Predictable Automotive High-Performance Platforms

    Due to the trends of centralizing the E/E architecture and new computing-intensive applications, high-performance hardware platforms are currently finding their way into automotive systems. However, the SoCs currently available on the market have significant weaknesses when it comes to providing predictable performance for time-critical applications. The main reason for this is that these platforms are optimized for average-case performance. This shortcoming represents one major risk in the development of current and future automotive systems. In this paper we describe how high performance and predictability could (and should) be reconciled in future HW/SW platforms. We believe that this goal can only be reached in a close collaboration between system suppliers, IP providers, semiconductor companies, and OS/hypervisor vendors. Furthermore, academic input will be needed to solve remaining challenges and to further improve initial solutions

    RDMA-Based Deterministic Communication Architecture for Autonomous Driving

    Autonomous driving is a big challenge for next-generation vehicles and requires multiple computationally intensive deep neural networks (DNNs) to be implemented on distributed automotive platforms. Distributed software that enables autonomous functionalities has strict timing requirements, e.g., low and deterministic end-to-end latency. Such timings rely on the communication technologies used in the automotive platform as much as on the computation performance of CPUs, GPUs, TPUs, and FPGAs. Hence, we advocate the use of Remote Direct Memory Access (RDMA) technology, typically used in data centers, in automotive platforms. As shown by our experiments with real hardware, Soft-RoCE (a software implementation of RDMA) offers low-latency communication because of minimal CPU involvement and reduced memory copies. At the same time, we show that the native implementation of RDMA does not support determinism, i.e., there is a high variation in communication delays in the presence of interfering data packets. To mitigate this issue, we propose a multi-layer communication stack comprising a deterministic scheduler on top of the Soft-RoCE layer. Further, we have developed a C++ library that offers easy-to-use communication interfaces for distributed applications while implementing the proposed architecture. Experiments show that our library (i) reduces the end-to-end latency of distributed object detection by nearly 9% while having an implementation overhead of less than 1.5% and (ii) minimizes the effects of other data traffic on the delay in high-priority communication

    Optimizing Data Transfers for Multiprocessor Systems on Chips

    No full text
    Multiprocessor systems-on-chip (MPSoCs), such as the CELL processor or the more recent Platform2012, are heterogeneous multi-core architectures with a powerful host processor and a computation fabric consisting of several smaller cores, whose intended role is to act as a general-purpose programmable accelerator. Computation-intensive (and parallelizable) parts of the application, initially intended to be executed by the host processor, are therefore offloaded to the multi-core fabric for execution. These parts of the application are often data intensive, operating on large arrays of data initially stored in a remote off-chip memory whose access is about 100 times slower than access to a core's local memory. Accessing data in the off-chip memory thus becomes a main performance bottleneck. A major characteristic of these platforms is a software-controlled local memory rather than a hidden cache mechanism: data movements in the memory hierarchy, typically performed using a DMA (Direct Memory Access) engine, are explicitly managed by the software. In this thesis, we attempt to optimize such data transfers in order to reduce or hide the off-chip memory latency

    Optimizing Data Transfers on Multiprocessor Systems-on-Chip (Optimisation des transferts de données sur systèmes multiprocesseurs sur puce)

    No full text
    Multiprocessor systems-on-chip (MPSoCs), such as the CELL processor or the more recent Platform2012, are heterogeneous multi-core architectures with a powerful host processor and a computation fabric consisting of several smaller cores, whose intended role is to act as a general-purpose programmable accelerator. Computation-intensive (and parallelizable) parts of the application, initially intended to be executed by the host processor, are therefore offloaded to the multi-core fabric for execution. These parts of the application are often data intensive, operating on large arrays of data initially stored in a remote off-chip memory whose access is about 100 times slower than access to a core's local memory. Accessing data in the off-chip memory thus becomes a main performance bottleneck. A major characteristic of these platforms is a software-controlled local memory rather than a hidden cache mechanism: data movements in the memory hierarchy, typically performed using a DMA (Direct Memory Access) engine, are explicitly managed by the software. In this thesis, we attempt to optimize such data transfers in order to reduce or hide the off-chip memory latency

    OASIcs, Volume 68, ASD'19, Complete Volume

    No full text
    OASIcs, Volume 68, ASD'19, Complete Volume
